23 research outputs found

    Dynamic power dissipation formulation for application in dynamic programming buffer insertion algorithm

    Buffer insertion is a very effective technique for reducing propagation delay in nanometre VLSI interconnects. There are two techniques for buffer insertion: (1) closed-form solution and (2) dynamic programming. A buffer insertion algorithm based on dynamic programming is more useful than the closed-form solution because it allows multiple buffer types and can be applied to tree-structured interconnects. As design dimensions shrink, more buffers are needed to improve timing performance. However, the buffers themselves consume power, and it has been shown that their power dissipation is significant. Although many buffer insertion algorithms can optimize propagation delay under a power constraint, most of them use the closed-form solution. Hence, in this paper, we present a formulation to compute the dynamic power dissipation of buffers for use in a dynamic programming buffer insertion algorithm. The proposed formulation allows the dynamic power dissipation of buffers to be computed incrementally. The technique is validated by comparing the formulation with the standard closed-form dynamic power equation. The advantage of the proposed formulation is demonstrated through a series of experiments in which it is applied in van Ginneken’s algorithm. The results show that the output of the proposed formulation is consistent with the standard closed-form formulation. Furthermore, they suggest that the proposed formulation can compute dynamic power dissipation for a buffer insertion algorithm with multiple buffer types.
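The abstract above does not give the formulation itself, so the following is only a minimal sketch of the general idea, assuming the textbook switching-power relation P = α · f · Vdd² · C. The names `BufferType`, `Candidate`, `insert_buffer`, and `closed_form_power`, and all numeric values, are hypothetical. It shows a dynamic-programming candidate carrying its buffer power as a running total, which can then be checked against the closed-form sum.

```python
from dataclasses import dataclass

# Assumed operating point; illustrative values only, not taken from the paper.
ALPHA = 0.15   # switching activity factor
FREQ = 1.0e9   # clock frequency in Hz
VDD = 1.0      # supply voltage in V


@dataclass(frozen=True)
class BufferType:
    name: str
    switched_cap: float  # capacitance switched by this buffer, in farads


@dataclass(frozen=True)
class Candidate:
    """A partial solution in a bottom-up (van Ginneken style) traversal."""
    power: float = 0.0    # buffer power accumulated so far, in watts
    buffers: tuple = ()   # buffers inserted so far


def buffer_power(buf: BufferType) -> float:
    # Textbook dynamic power of a single buffer: alpha * f * Vdd^2 * C.
    return ALPHA * FREQ * VDD ** 2 * buf.switched_cap


def insert_buffer(cand: Candidate, buf: BufferType) -> Candidate:
    # Incremental update: only the new buffer's contribution is added.
    return Candidate(cand.power + buffer_power(buf), cand.buffers + (buf,))


def closed_form_power(buffers) -> float:
    # Reference value computed in one shot over the finished solution.
    return ALPHA * FREQ * VDD ** 2 * sum(b.switched_cap for b in buffers)


small = BufferType("small", 2.0e-15)
large = BufferType("large", 8.0e-15)

cand = Candidate()
for b in (small, large, small):
    cand = insert_buffer(cand, b)
```

Because the running total depends only on the buffers already chosen, each candidate can be extended with any buffer type without recomputing the whole sum, which is what makes this style of bookkeeping fit a dynamic-programming algorithm.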

    Reconfigurable Logic Embedded Architecture of Support Vector Machine Linear Kernel

    Support Vector Machine (SVM) is a linear binary classifier that requires a kernel function to handle non-linear problems. Most previous SVM implementations for embedded systems in the literature were built for a specific application, and were analyzed only by comparison with software implementations. The impact of different application datasets on SVM hardware performance was not analyzed. In this work, we propose a parameterizable linear kernel architecture that is fully pipelined. It is prototyped and analyzed on an Altera Cyclone IV platform, and the results are verified against an equivalent software model. Further analysis determines the effect of the number of features and support vectors on the performance of the hardware architecture. In our proposed linear kernel implementation, the number of features determines the maximum operating frequency and the amount of logic resource utilization, whereas the number of support vectors determines the amount of on-chip memory usage and the throughput of the system.
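As a rough software analogue of a linear-kernel SVM classifier, here is a minimal sketch using the standard decision function sign(Σ cᵢ · (svᵢ · x) + b). It is not the paper's hardware architecture, and the names `linear_kernel` and `svm_decide` and the sample data are hypothetical. The inner dot product grows with the number of features and the outer loop grows with the number of support vectors, which are the two parameters the abstract analyzes.

```python
def linear_kernel(u, v):
    # Dot product: one multiply-accumulate per feature.
    return sum(a * b for a, b in zip(u, v))


def svm_decide(support_vectors, coeffs, bias, x):
    """Classify x as +1 or -1.

    coeffs[i] is assumed to already combine the Lagrange multiplier
    and the class label of support vector i (alpha_i * y_i).
    """
    acc = bias
    # One kernel evaluation per stored support vector.
    for sv, c in zip(support_vectors, coeffs):
        acc += c * linear_kernel(sv, x)
    return 1 if acc >= 0 else -1


# Tiny illustrative model with two features and two support vectors.
svs = [[1.0, 0.0], [0.0, 1.0]]
coeffs = [1.0, -1.0]
bias = 0.0

label_a = svm_decide(svs, coeffs, bias, [2.0, 1.0])
label_b = svm_decide(svs, coeffs, bias, [0.0, 3.0])
```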

    Energy-Aware Network-on-Chip Application Mapping Based on Domain Knowledge Genetic Algorithm

    This paper addresses energy-aware application mapping for large-scale network-on-chip (NoC). The increasing number of intellectual property (IP) cores in multi-processor systems-on-chip (MPSoCs) makes it more challenging to find the optimum core-to-topology mapping. This paper proposes an application mapping technique that incorporates domain knowledge into a genetic algorithm (GA) to minimize the energy consumption of NoC communication. The GA is initialized with knowledge of the network partition, whereas the genetic crossover operator is guided by inter-core communication demands. NoC energy estimation is based on an analytical energy model and cycle-accurate Noxim simulation. For large-scale NoC, application mapping using the knowledge-based genetic operator saves up to 28% energy compared to mapping with a conventional GA. Adding knowledge-based initial mapping speeds up convergence by 81% and saves a further 5% energy compared to a GA with knowledge-based crossover only. Furthermore, cycle-accurate simulations of applications with traffic dependency show the effectiveness of the proposed application mapping for large-scale NoC.
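The abstract does not spell out its analytical energy model. The sketch below assumes a commonly used bit-energy form for a 2D mesh, in which a bit crossing d links passes through d + 1 routers, so its energy is (d + 1) · E_router + d · E_link, with d taken as the Manhattan distance between tiles. The function names, energy constants, and example traffic are all hypothetical. It shows how a candidate core-to-tile mapping could be scored, for example as a GA fitness value.

```python
E_ROUTER = 1.0  # energy per bit per router (arbitrary units, assumed)
E_LINK = 0.5    # energy per bit per link (arbitrary units, assumed)


def tile_xy(tile, mesh_width):
    # Row-major tile index to (x, y) coordinates on the mesh.
    return tile % mesh_width, tile // mesh_width


def hop_distance(tile_a, tile_b, mesh_width):
    # Manhattan distance, i.e. the number of links on a minimal route.
    ax, ay = tile_xy(tile_a, mesh_width)
    bx, by = tile_xy(tile_b, mesh_width)
    return abs(ax - bx) + abs(ay - by)


def mapping_energy(comm, mapping, mesh_width):
    """Total communication energy of a core-to-tile mapping.

    comm maps (source_core, destination_core) to a traffic volume in bits;
    mapping maps each core name to a tile index.
    """
    total = 0.0
    for (src, dst), volume in comm.items():
        d = hop_distance(mapping[src], mapping[dst], mesh_width)
        total += volume * ((d + 1) * E_ROUTER + d * E_LINK)
    return total


# Three cores on a 2x2 mesh; core B talks to both A and C.
comm = {("A", "B"): 10, ("B", "C"): 5}
good = {"A": 0, "B": 1, "C": 3}  # every communicating pair is adjacent
poor = {"A": 0, "B": 3, "C": 1}  # A and B sit on opposite corners

energy_good = mapping_energy(comm, good, mesh_width=2)
energy_poor = mapping_energy(comm, poor, mesh_width=2)
```

Under these assumed constants the mapping that keeps heavily communicating cores adjacent scores lower, which is the property an energy-aware search tries to exploit.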

    Configurable Version Management Hardware Transactional Memory for Multi-processor Platform

    Programming shared-memory multi-processor platforms efficiently is difficult because lock-based synchronization limits efficiency. Transactional memory (TM) is a promising approach to creating an abstraction layer for multi-threaded programming. However, the performance of TM is application-specific. In general, the configuration of a TM is divided into version management and conflict management. Each scheme has its strengths and weaknesses depending on the executing application. Previous TM implementations for embedded systems were built on a fixed version management configuration, which results in significant performance loss when transaction behaviour changes. In this paper, we propose a hardware transactional memory (HTM) with interchangeable version management. Random requests at different contention levels are used to verify the performance of the proposed TM. The proposed architecture targets embedded applications and is area-efficient compared to current implementations that apply cache coherence protocols.

    Application profiling and mapping on NoC-based MPSoC emulation platform on reconfigurable logic

    In network-on-chip (NoC) based multi-processor system-on-chip (MPSoC) development, application profiling is one of the most crucial design-time steps for searching and exploring optimal mappings. Conventional mapping exploration methodologies analyse application-specific graphs by estimating their runtime behaviour using analytical or simulation models. However, the former do not replicate the actual application run-time performance, while the latter require a significant amount of time for exploration. To map applications onto a specific MPSoC platform, the application behaviour on a cycle-accurate emulated platform should be considered in order to obtain better mapping quality. This paper proposes an application mapping methodology that utilizes an MPSoC prototyped in a Field-Programmable Gate Array (FPGA). Applications are implemented on homogeneous MPSoC cores, and their costs are analysed and profiled on the platform in terms of execution time, intra-core communication delay and inter-core communication delay. These metrics are utilized in the analytical evaluation of the application mapping. The proposed analytical mapping is evaluated against the exhaustive brute-force method. Results show that the proposed method produces mappings of comparable quality to the ground-truth solutions in a shorter evaluation time.

    Performance Evaluation of Centralized Reconfigurable Transmitting Power Scheme in Wireless Network-on-chip

    Network-on-chip (NoC) is an on-chip communication network that allows parallel communication among all cores to improve inter-core performance. Wireless NoC (WiNoC) introduces long-range, high-bandwidth radio frequency (RF) interconnects that can reduce the multi-hop communication of the planar metal interconnects in conventional NoC platforms. In WiNoC, RF transceivers, and their transmitters in particular, account for a significant share of the total communication energy. This paper evaluates the energy and latency performance of a closed-loop power management mechanism that reconfigures the transmitting power in WiNoC based on the number of erroneous received packets. The scheme achieves significant energy savings with limited performance degradation and insignificant impact on throughput.

    Hybrid routing tree with buffer insertion under obstacle constraints

    Performance optimization in very-large-scale integration (VLSI) design is key to the success of today's design automation methodologies. One of the performance issues is the interconnect delay in deep sub-micron VLSI circuits. The interconnect delay becomes more dominant than the gate delay as the size of the gates is reduced. This paper presents an algorithm to optimize the timing performance of the routing tree under obstacle constraints. Simultaneous routing and buffer insertion has been proven to be NP-complete, while the two-step approach may produce a poor solution. Therefore, we propose a hybrid algorithm that modifies a given routing tree simultaneously with buffer insertion. This paper describes the algorithm and presents experimental results showing that it can improve the timing of the routing tree significantly with low execution time.

    Interleaved incremental/decremental support vector machine for embedded system

    Incremental and Decremental Support Vector Machine (IDSVM) is a widely used incremental learning algorithm that is highly accurate but has high computational complexity. For IDSVM to be deployed in embedded systems, a moving-window architecture is needed to limit the number of support vectors in the model. This increases the complexity of the system, as old data need to be unlearned while new data are learned. This work proposes an interleaved IDSVM (IIDSVM) architecture that performs incremental and decremental learning simultaneously. This work targets embedded system platforms with limited on-chip memory. The proposed solution achieves a speed improvement of 60%-70% while producing accuracy similar to IDSVM.

    An optimized buffer insertion algorithm with delay-power constraints for VLSI layouts

    We propose a grid-graph algorithm for interconnect routing and buffer insertion in nanometer VLSI layout designs. The algorithm is designed to handle multiconstraint optimizations, namely timing performance and power dissipation. The proposed algorithm is called HRTB-LA, which stands for hybrid routing tree and buffer insertion with look-ahead. In recent VLSI designs, interconnect delay has become a dominant factor compared to gate delay. The well-known technique for minimizing the interconnect delay is to insert buffers along the interconnect wires. However, the buffers themselves consume power, and it has been shown that the power dissipation overhead due to buffer insertion is significant. Many methodologies to optimize timing performance under a power constraint have been proposed, but none is based on a dynamic programming technique using a grid graph. In addition, most buffer insertion algorithms use a postrouting buffer insertion approach. In the presence of buffer obstacles, these postrouting algorithms may produce poor solutions. On the other hand, the simultaneous routing and buffer insertion algorithm offers a better solution, but it has been proven to be NP-complete. Hence, our main contribution is an efficient algorithm using a hybrid approach to multiconstraint optimization for multisink nets. The algorithm uses dynamic programming to incrementally compute the interconnect delay and the power dissipation of the inserted buffers, while an effective runtime is achieved with the aid of novel look-ahead and graph pruning schemes. Experimental results show that HRTB-LA is able to handle multiconstraint optimizations and produces a solution up to 30% better than a postrouting buffer insertion algorithm in comparable runtime.